Abstract


Software tools were designed to characterize the acoustic features of marine animal sounds.
These tools produced a set of calculated measurements that summarize particular aspects
of sound sequences. The specificity of these measurements was enhanced by adjusting
calculations to compensate for ambient noise. The sound measures included statistics for
Aggregate Bandwidth, Intensity, Duration, Amplitude Modulation, Frequency Modulation,
Short-term Bandwidth, Center Frequency, and Amplitude Frequency Interaction. The
efficacy of noise compensation was tested for each statistic. Then, the sound measures were
tested on a subset of 200 sequences of marine animal sounds, including sequences from 20
species: six baleen whales, 13 toothed species, and one seal. The statistics were reviewed
for each species and a graphical comparison of all species was generated using principal
components analysis. Preliminary results confirm that such sounds can be classified by
means of relatively simple statistical algorithms, and we are encouraged to continue toward
a system for automatic classification of marine animal sounds.


1  Introduction 


Marine animals produce a remarkable variety of sounds (Watkins and Wartzok 1985). A
primary goal of the bioacoustic program at the Woods Hole Oceanographic Institution
(WHOI) has been to parse this variation into sensible classes of signals. Marine mammal
sounds in particular contain distinctive features associated with species (op. cit.), individual
identity (Caldwell, Caldwell and Tyack 1990), and certain behaviors. These features have
never been examined in a broad context, comparing the sounds of a wide variety of species.
Do the differences in these sound features remain distinctive as the scope of comparison
widens? With our own ears, we can often distinguish acoustic features that appear to
be species-specific, and sometimes features unique to individual animals; can we specify
numerical algorithms that objectively recognize these distinctions?

The logistic requirements for addressing these questions have been formidable. To quantify
the interspecific and intraspecific variability in marine animal sounds, a large number
of sounds must be analyzed for each individual or species to be differentiated. Many biological
and environmental attributes potentially explain acoustic variability. Therefore, the
numeric results had to be referenced to these attributes: species, population, group, social
context, behavior, activity, individual identity, sex, reproductive situation, age, season,
geographic location, water depth, and sound propagation. Thus, a necessary resource for such
acoustic distinctions has been a system for integrating the sound sequences with associated
biological and environmental data.

The SOUND database system organized for marine animal sounds (Watkins, Fristrup,
and Daher 1991) has provided this resource. The databases and associated files contain
thousands of digitized sound segments spanning more than seventy species recorded from
all the world’s oceans. The database describes the time, geographic location, recording
conditions, identity of the animal(s) producing the sounds, the behavioral observations
associated with sound production, etc. These SOUND databases represent years of work by
several people, and the analyses reported here depend on their availability. In turn,
these analyses complemented and extended the capabilities of the database. New relational
database structures have been implemented to permit flexible and convenient integration of
these statistical results with the biological and environmental information about the sounds.

The quantification of time-frequency characters of the animal sounds for these analytic
distinctions has had no precedent on this scale. No prior work has dealt with so many
species and such a variety of repertoires from individual animals. The WHOI studies of
marine animal acoustics, which have continued since William E. Schevill’s work in the late
1940’s, have provided the heuristic basis for these statistical decisions. We have learned
to utilize many different acoustic features to describe and diagnose sounds. As a first step
toward the development of an automatic, non-subjective system for separating the different
animal sound sequences, we have devised statistical measures to recognize familiar acoustic
features.

This  report  describes  the  numerical  procedures  that  have  been  used,  and  it  demonstrates 
their  effectiveness  with  a  trial  set  of  200  digitized  sequences  of  marine  mammal  sounds. 
These  preliminary  results  suggest  that  the  gross  acoustic  features  we  analyzed  can  be  useful 
indicators  of  species  identity,  and  that  with  refinement  they  might  provide  the  basis  for 
finer  distinctions. 

2 The Statistics


Our  statistical  estimation  techniques  were  based  on  our  experience  with  the  marine  animal 
sounds  in  our  tape  library.  We  were  guided  by  the  following  criteria: 

•  Each  statistic  was  designed  to  emphasize  particular  parameters  of  animal  sounds  that 
we  recognized  as  important  for  distinguishing  species. 

•  Each  statistic  had  to  be  insensitive  to  sound  artifacts  introduced  by  propagation  in 
the  ocean  (multipath,  fading,  frequency-dependent  attenuation,  etc.). 

•  Most  statistics  needed  to  be  relatively  insensitive  to  noise  and  assumed  a  minimum 
of  15  dB  signal/noise. 

•  Most  statistics  had  to  be  insusceptible  to  the  shape  (relative  frequency  emphases)  of 
the  ambient  noise  power  spectra. 

•  Most  statistics  needed  to  be  related  to  obvious  features  in  the  time-frequency  analysis 
displays  of  these  sounds  (duration,  frequency  range,  etc.)  -  so  we  could  recognize  the 
effectiveness  of  the  statistics  in  making  the  discriminations. 

These criteria reflected our interest in discriminating among the animal sounds rather
than making selections that were largely controlled by differences in the ambient backgrounds.
The choice of criteria did not take into consideration changes to the sounds
contributed by the orientation and movements of sound sources. A number of other effects
also have not been addressed, including means for dealing with frequency-dependent attenuation.
Statistics that obviously would be sensitive to distortion of phase information have
been avoided in these analyses.

The basic unit of data used for our feature extraction programs was one FFT (Fast
Fourier Transform) block. For most files, this was 256 sample points, but for very short files
(low sampling rates) the FFT size was decreased to obtain no fewer than 16 blocks. Adjacent
blocks did not overlap. A constant was subtracted from each sample point such that the
mean for each block became zero, and the data were then tapered with a Hamming window.
These choices eliminated explicit correlation between adjacent blocks and smoothed the
resulting power spectra; the cost of this was a reduction in the degrees of freedom for our
analyses.
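
The block preparation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the original program: the function names are hypothetical, the rule of halving the block size until 16 blocks fit is an assumption (the text says only that the FFT size was decreased), and a naive DFT stands in for an FFT so that the example needs nothing beyond the standard library.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (standard 0.54 / 0.46 coefficients)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def block_power_spectra(samples, block_size=256, min_blocks=16):
    """Split samples into non-overlapping blocks; return (block_size, spectra)."""
    # ASSUMPTION: halve the block size until at least min_blocks blocks fit.
    while block_size > 2 and len(samples) // block_size < min_blocks:
        block_size //= 2
    window = hamming(block_size)
    spectra = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        mean = sum(block) / block_size
        # Remove the block mean, then taper with the Hamming window.
        tapered = [(x - mean) * w for x, w in zip(block, window)]
        # Naive DFT over the non-negative frequency bins (stand-in for an FFT).
        spectrum = []
        for k in range(block_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / block_size)
                      for n, x in enumerate(tapered))
            spectrum.append(abs(acc) ** 2)
        spectra.append(spectrum)
    return block_size, spectra
```

Each returned spectrum is a list of power values, one per frequency bin, which is the form assumed by the later sketches in this section.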

The noise compensation technique began by estimating the power spectrum of sounds
that were present throughout the sound cut. To identify blocks of data containing only noise
energy, intensity measures were computed for up to 600 blocks of data distributed evenly
through a sound cut. Blocks were sorted by intensity, and the blocks between the fifth and
tenth percentiles in level were used to form a noise power spectrum. We eliminated the
bottom five percent to avoid using atypically quiet sections (tape dropout, etc.). During
subsequent processing of these data, a multiple (currently 6.67x) of this noise spectrum was
subtracted from each block’s power spectrum (negative values set to zero). All spectral
statistics were computed from this reduced power spectrum. To obtain the amplitude
estimate for the block, the adjusted spectrum values were summed; this indirect method of
computing amplitude, which exploits Parseval’s relation (Oppenheim and Schafer 1989, p.
574), prevents loud noise components from dominating the amplitude statistics.
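
A minimal sketch of this noise compensation follows, under stated assumptions: the pre-compensation intensity of a block is taken to be the sum of its power spectrum, the noise spectrum is taken to be the bin-by-bin average of the blocks in the 5th to 10th percentile band, and the function name is hypothetical. The text does not specify these details.

```python
def noise_compensate(spectra, multiple=6.67, max_blocks=600):
    """Return (noise_spectrum, adjusted_spectra, ss) for a list of block power spectra."""
    # Use up to max_blocks blocks spread evenly through the sound cut.
    step = max(1, len(spectra) // max_blocks)
    sampled = spectra[::step][:max_blocks]
    # ASSUMPTION: block intensity is the sum of the block's power spectrum.
    ranked = sorted(sampled, key=sum)
    lo = int(0.05 * len(ranked))                 # skip the quietest five percent
    hi = max(lo + 1, int(0.10 * len(ranked)))    # keep up to the tenth percentile
    quiet = ranked[lo:hi]
    n_bins = len(spectra[0])
    # ASSUMPTION: the noise spectrum is the bin-by-bin mean of the quiet blocks.
    noise = [sum(s[k] for s in quiet) / len(quiet) for k in range(n_bins)]
    # Subtract a multiple of the noise spectrum; negative values become zero.
    adjusted = [[max(0.0, s[k] - multiple * noise[k]) for k in range(n_bins)]
                for s in spectra]
    # Relative intensity ss for each block: sum of the adjusted spectrum.
    ss = [sum(s) for s in adjusted]
    return noise, adjusted, ss
```

With a multiple well above one, blocks that contain only background noise are driven to an intensity of zero, so they contribute no weight to the weighted statistics defined below.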

2.1  Abbreviations  in  Statistical  Formulae 

• $t_i$, time in seconds: the interval from the beginning of the sound cut to the beginning
of the FFT block.

• $ss_i$, relative intensity in arbitrary units: sum of the adjusted power spectrum values
for the block.

• $mm_i = \min(ss_{i-1}, ss_i)$: the smaller of two adjacent ss values.

• $F_{50}$ in Hertz: the frequency that bisects the area under the power spectral density.

• $N_{50}$ in Hertz: the minimum number of frequency bins required to accumulate fifty
percent of the total signal energy.

• $F_{75}$ in Hertz: the highest frequency encountered when calculating $N_{50}$.

• $F_{25}$ in Hertz: the lowest frequency encountered when calculating $N_{50}$.

Considerable  use  of  symbols  could  not  be  avoided,  but  wherever  possible  we  have  used 
descriptive  terms  for  ease  of  interpretation. 
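
The frequency measures defined above can be sketched for a single power spectrum as follows. The function names are hypothetical. Two details are assumptions: no interpolation is done within a bin, and the bin count for $N_{50}$ is converted to Hertz by multiplying by the bin width, since the definition gives its unit as Hertz.

```python
def median_frequency(spectrum, bin_hz):
    """F50: first bin frequency at which cumulative power reaches half the total."""
    total = sum(spectrum)
    if total <= 0:
        return None                      # block was zeroed by noise compensation
    running = 0.0
    for k, power in enumerate(spectrum):
        running += power
        if running >= total / 2:
            return k * bin_hz
    return None


def concentration(spectrum, bin_hz):
    """Return (N50, F25, F75) in Hertz for one power spectrum."""
    total = sum(spectrum)
    if total <= 0:
        return None
    # Take bins from strongest to weakest until half the energy is covered.
    order = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    running, used = 0.0, []
    for k in order:
        running += spectrum[k]
        used.append(k)
        if running >= total / 2:
            break
    # ASSUMPTION: N50 in Hertz is the number of bins used times the bin width.
    return len(used) * bin_hz, min(used) * bin_hz, max(used) * bin_hz
```

For a spectrum with two separated peaks, `concentration` reports a small $N_{50}$ but widely spaced $F_{25}$ and $F_{75}$, which is the distinction between "packed" and gap-spanning bandwidth used later in the report.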

Table 1: Summary of Statistics

Statistic                                        Equation

Duration (results in Table 2)
  Total Duration                                 2
  Sound Concentration                            3

Amplitude Modulation (results in Table 3)
  Amplitude Mean                                 4
  Amplitude Standard Deviation                   5
  Attack Fraction                                6
  Attack Proportion                              7
  Amplitude Skewness                             8

Frequency Modulation (results in Table 4)
  Upsweep Mean                                   9
  Upsweep Fraction                               10
  Upsweep Proportion                             11
  Time Frequency Correlation                     12
  Time Upsweep Correlation                       13

Short-term Bandwidth (results in Table 5)
  Short-term Bandwidth Mean                      14
  Short-term Spectral Concentration              15
  Short-term Spectral Asymmetry                  16

Aggregate Bandwidth (results in Table 6)
  Total Upper Frequency, $F_{75}$                (none)
  Total Lower Frequency, $F_{25}$                (none)
  Total Spectrum Concentration, $N_{50}$         (none)
  Modal Upper Frequency, $F_{75}$                (none)
  Modal Lower Frequency, $F_{25}$                (none)
  Modal Spectrum Concentration, $N_{50}$         (none)

Center Frequency (results in Table 7)
  Median Frequency Mean                          17
  Total Spectrum Median Frequency                (none)
  Modal Spectrum Median Frequency                (none)

Amplitude Frequency Interaction (results in Table 8)
  Amplitude Frequency Correlation                18
  Amplitude Upsweep Correlation                  19


Intensity measures were computed for each block of data. These values were used as
weights to find the “center” of the sound, defined as the weighted average of the time values
(Eq. 1). This statistic is used only as a reference point for subsequent calculations of Sound
Duration (Eq. 2) and Amplitude Skewness (Eq. 8).


• Signal Center: weighted mean of t, ss as weights.

$$\bar{t} = \frac{\sum_{i=0}^{N} ss_i\, t_i}{\sum_{i=0}^{N} ss_i} \tag{1}$$


2.2 Sound Duration


The Sound Duration was computed by Equation 2. It yielded a gross estimate of total
duration, including any intervals of silence between sound elements.


• Sound Duration: four times the weighted standard deviation of t, ss as weights.

$$4\sigma_t = 4\sqrt{\frac{\sum_{i=0}^{N} ss_i\, t_i^2}{\sum_{i=0}^{N} ss_i} - \bar{t}^{\,2}} \tag{2}$$


The Sound Concentration was computed by Equation 3. It yielded an estimate of the
duration that would result if the sound were “packed” such that all silent sections were
removed. It responds only to the relative amplitudes of different blocks, and it is insensitive
to their ordering in the sound.


•  Sound  Concentration:  equivalent  statistical  bandwidth  of  the  amplitude  values. 


The  ratio  of  these  two  duration  estimates  can  be  used  to  measure  duty  cycle. 
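
The three duration measures can be sketched as below. Signal Center and Sound Duration follow the definitions in the text. The form used for Sound Concentration is an assumption: it applies the usual definition of equivalent statistical bandwidth, (sum of values) squared over the sum of squared values, to the ss sequence and scales by the block length in seconds. The function name and the `block_seconds` parameter are hypothetical.

```python
import math


def duration_stats(t, ss, block_seconds):
    """Return (signal_center, sound_duration, sound_concentration) in seconds."""
    total = sum(ss)
    center = sum(w * x for w, x in zip(ss, t)) / total
    second_moment = sum(w * x * x for w, x in zip(ss, t)) / total
    # Sound Duration: four weighted standard deviations of the block times.
    sound_duration = 4 * math.sqrt(max(0.0, second_moment - center ** 2))
    # ASSUMED form: equivalent statistical bandwidth of the ss values,
    # expressed in seconds. It ignores the ordering of the blocks.
    sound_concentration = block_seconds * total ** 2 / sum(w * w for w in ss)
    return center, sound_duration, sound_concentration
```

Moving the same energy farther apart in time lengthens Sound Duration but leaves Sound Concentration unchanged, so their ratio falls, which is the duty-cycle behavior described above.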

2.3 Amplitude Modulation

A reference value was computed for the average level of the sound: the Amplitude Mean
was computed by Equation 4.


• Amplitude Mean: average ss value.

$$\overline{ss} = \frac{1}{N}\sum_{i} ss_i \tag{4}$$

The Amplitude Standard Deviation (Eq. 5) can be used to measure the average magnitude
of amplitude modulation. However, Amplitude Mean and Amplitude Standard Deviation
are scaled arbitrarily by idiosyncrasies of the digitizing process. To form a useful
diagnostic, they must be used together to form a scale-independent measure like the
coefficient of variation (Eq. 5 / Eq. 4).

• Amplitude Standard Deviation: standard deviation of ss values.

$$\sigma_{ss} = \sqrt{\frac{\sum_{i} ss_i^2}{N} - \left(\frac{\sum_{i} ss_i}{N}\right)^{2}} \tag{5}$$


The occurrence of sections of sound with increasing and decreasing amplitudes was
measured by the fraction of blocks in which the subsequent block had a larger amplitude
than the current block. This Attack Fraction was computed by Equation 6.


• Attack Fraction: the fraction of blocks in which the subsequent block has a larger ss value
than the current block.

$$\frac{1}{N}\sum_{\Delta ss_i > 0} 1, \qquad \Delta ss_i = ss_i - ss_{i-1} \tag{6}$$
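
The ratio of the average increase in amplitude to the sum of the average increase and
the average decrease in amplitude was calculated and called the Attack Proportion. These
statistics (Eq. 6 and Eq. 7) were similar in function to the “attack” and “decay” terms
used to refer to the initial and terminal amplitude modulation of individual notes in music.
The Attack Proportion was computed by Equation 7.

• Attack Proportion: the proportion of average increase in ss values relative to the sum
of average increases and decreases in ss values.

$$\frac{\dfrac{1}{n_{+}}\sum_{\Delta ss_i > 0} \Delta ss_i}{\dfrac{1}{n_{+}}\sum_{\Delta ss_i > 0} \Delta ss_i \;-\; \dfrac{1}{n_{-}}\sum_{\Delta ss_i < 0} \Delta ss_i} \tag{7}$$

where $n_{+}$ and $n_{-}$ are the numbers of increases and decreases. The second term in the
denominator is negative, so the denominator is the sum of the magnitudes of the average
increase and the average decrease.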

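Gross asymmetry in the amplitude modulation relative to the “center” of the sound was
measured by the skewness of the time values, weighted by amplitude. This Amplitude
Skewness was computed by Equation 8.


• Amplitude Skewness: weighted skewness of t, ss as the weights.

$$\frac{1}{\sigma_t^{3}}\left(\frac{\sum_{i=0}^{N} ss_i\, t_i^{3}}{\sum_{i=0}^{N} ss_i} - 3\,\bar{t}\,\frac{\sum_{i=0}^{N} ss_i\, t_i^{2}}{\sum_{i=0}^{N} ss_i} + 2\,\bar{t}^{\,3}\right) \tag{8}$$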
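
The amplitude modulation statistics of Equations 4 through 7 can be sketched together. This is an illustration with a hypothetical function name; it divides by the number of blocks (or of adjacent-block differences) and takes "average increase" and "average decrease" to be the plain means of the positive and negative changes, which is one reading of the definitions above.

```python
import math


def amplitude_stats(ss):
    """Amplitude Mean, Standard Deviation, their ratio, and the two attack measures."""
    n = len(ss)
    mean = sum(ss) / n
    std = math.sqrt(max(0.0, sum(x * x for x in ss) / n - mean * mean))
    # Change in intensity between each block and the one before it.
    deltas = [b - a for a, b in zip(ss, ss[1:])]
    rises = [d for d in deltas if d > 0]
    falls = [-d for d in deltas if d < 0]      # stored as positive magnitudes
    avg_rise = sum(rises) / len(rises) if rises else 0.0
    avg_fall = sum(falls) / len(falls) if falls else 0.0
    return {
        "mean": mean,
        "std": std,
        # Scale-independent measure: Eq. 5 divided by Eq. 4.
        "coefficient_of_variation": std / mean if mean else None,
        "attack_fraction": len(rises) / len(deltas),
        "attack_proportion": (avg_rise / (avg_rise + avg_fall)
                              if rises or falls else None),
    }
```

A sound that jumps up in one step and decays over several steps has a low Attack Fraction (few rising steps) but a high Attack Proportion (the single rise is large), which illustrates why the two measures tend to move in opposite directions.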


2.4 Frequency Modulation

The frequency modulation of a sound was expressed by differences between median frequencies
of adjacent power spectrum estimates, $\Delta F_{50,i} = F_{50,i} - F_{50,i-1}$. The average
upsweep trends (downsweep is a negative upsweep) for entire sound cuts were calculated as
weighted averages of the change in frequency, with amplitude values as weights, giving the
Upsweep Mean using Equation 9.


• Upsweep Mean: weighted average of the change in $F_{50}$ values, mm as weights.

$$\overline{\Delta F_{50}} = \frac{\sum_{i=1}^{N} mm_i\, \Delta F_{50,i}}{\sum_{i=1}^{N} mm_i} \tag{9}$$


Estimates of the relative occurrence of frequency modulation were calculated by determining
the fraction of total energy that coincided with increases in frequency. This Upsweep
Fraction was computed by Equation 10.


• Upsweep Fraction: fraction of summed mm values that coincide with increases in $F_{50}$.

$$\frac{\sum_{\Delta F_{50,i} > 0} mm_i}{\sum_{i=1}^{N} mm_i} \tag{10}$$


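The relative magnitudes of frequency upsweeps and downsweeps were calculated as the
Upsweep Proportion, using Equation 11.


• Upsweep Proportion: proportion of average weighted increase in $F_{50}$ to the sum of
the weighted average increases and decreases in $F_{50}$.

$$\frac{\dfrac{\sum_{\Delta F_{50,i} > 0} mm_i\, \Delta F_{50,i}}{\sum_{\Delta F_{50,i} > 0} mm_i}}{\dfrac{\sum_{\Delta F_{50,i} > 0} mm_i\, \Delta F_{50,i}}{\sum_{\Delta F_{50,i} > 0} mm_i} \;-\; \dfrac{\sum_{\Delta F_{50,i} < 0} mm_i\, \Delta F_{50,i}}{\sum_{\Delta F_{50,i} < 0} mm_i}} \tag{11}$$

The second term in the denominator is negative, so the denominator is the sum of the
magnitudes of the weighted average increase and the weighted average decrease.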
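
The three upsweep statistics share the same inputs and can be sketched in one function. The function name is hypothetical, and the treatment of the downsweep term (a signed average that is subtracted, so that the denominator is a sum of magnitudes) is an interpretation of the wording "sum of the weighted average increases and decreases".

```python
def upsweep_stats(f50, ss):
    """Return (upsweep_mean, upsweep_fraction, upsweep_proportion)."""
    mm = [min(a, b) for a, b in zip(ss, ss[1:])]      # weight for each adjacent pair
    df = [b - a for a, b in zip(f50, f50[1:])]        # change in median frequency
    total = sum(mm)
    upsweep_mean = sum(w * d for w, d in zip(mm, df)) / total
    up_weight = sum(w for w, d in zip(mm, df) if d > 0)
    down_weight = sum(w for w, d in zip(mm, df) if d < 0)
    upsweep_fraction = up_weight / total
    avg_up = (sum(w * d for w, d in zip(mm, df) if d > 0) / up_weight
              if up_weight else 0.0)
    avg_down = (sum(w * d for w, d in zip(mm, df) if d < 0) / down_weight
                if down_weight else 0.0)
    spread = avg_up - avg_down                        # avg_down is negative or zero
    upsweep_proportion = avg_up / spread if spread else None
    return upsweep_mean, upsweep_fraction, upsweep_proportion
```

Using the smaller of two adjacent intensities as the weight means a frequency change counts only when both blocks carry signal energy, so jumps into or out of silence are ignored.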


The magnitudes of linear relationships between time and median frequency were estimated
by calculating correlation coefficients. This correlation used intensity values as
weights to focus on portions of the sound cuts with loud signals. This was computed as a
Time Frequency Correlation by Equation 12.


• Time Frequency Correlation: weighted correlation between $F_{50}$ and t, ss as weights.

$$\frac{\dfrac{\sum_{i=0}^{N} ss_i\, t_i\, F_{50,i}}{\sum_{i=0}^{N} ss_i} - \bar{t}\;\overline{F_{50}}}{\sigma_t\, \sigma_F} \tag{12}$$

where $\overline{F_{50}}$ is the Median Frequency Mean (Eq. 17) and

$$\sigma_F = \sqrt{\frac{\sum_{i=0}^{N} ss_i\, F_{50,i}^{2}}{\sum_{i=0}^{N} ss_i} - \overline{F_{50}}^{\,2}}$$
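
A weighted correlation of this kind can be sketched with a single helper, shown here with a hypothetical name. Called with block times, $F_{50}$ values, and ss weights it corresponds to the Time Frequency Correlation; called with block times, changes in $F_{50}$, and mm weights it corresponds to the Time Upsweep Correlation of Equation 13. Returning `None` when either weighted variance is zero is a choice made for this sketch.

```python
import math


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation between x and y using weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances in the "mean of products" form.
    cov = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / total - mean_x * mean_y
    var_x = sum(wi * xi * xi for wi, xi in zip(w, x)) / total - mean_x ** 2
    var_y = sum(wi * yi * yi for wi, yi in zip(w, y)) / total - mean_y ** 2
    if var_x <= 0 or var_y <= 0:
        return None          # correlation undefined for a constant series
    return cov / math.sqrt(var_x * var_y)
```

Because blocks with zero weight drop out of every sum, silent blocks between sounds do not influence the result.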


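The magnitudes of linear relationships between time and frequency upsweep were also
estimated by a weighted correlation coefficient. This Time Upsweep Correlation was computed
by Equation 13.


• Time Upsweep Correlation: weighted correlation between $\Delta F_{50}$ and t, mm as weights.

$$\frac{\dfrac{\sum_{i=1}^{N} mm_i\, t_i\, \Delta F_{50,i}}{\sum_{i=1}^{N} mm_i} - \bar{t}_{mm}\;\overline{\Delta F_{50}}}{\tau_t\, \tau_F} \tag{13}$$

where

$$\bar{t}_{mm} = \frac{\sum_{i=1}^{N} mm_i\, t_i}{\sum_{i=1}^{N} mm_i}, \qquad \overline{\Delta F_{50}} = \frac{\sum_{i=1}^{N} mm_i\, \Delta F_{50,i}}{\sum_{i=1}^{N} mm_i}$$

$$\tau_t = \sqrt{\frac{\sum_{i=1}^{N} mm_i\, t_i^{2}}{\sum_{i=1}^{N} mm_i} - \bar{t}_{mm}^{\,2}}, \qquad \tau_F = \sqrt{\frac{\sum_{i=1}^{N} mm_i\, \Delta F_{50,i}^{2}}{\sum_{i=1}^{N} mm_i} - \overline{\Delta F_{50}}^{\,2}}$$


2.5 Short-term Bandwidth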


Two measures of bandwidth were computed for each block. The gross spread in power
spectral values (frequency) was calculated to give the Short-term Bandwidth Mean by
Equation 14.

• Short-term Bandwidth Mean: weighted average of $F_{75} - F_{25}$, ss as weights.

$$\frac{\sum_{i=0}^{N} ss_i\,\left(F_{75,i} - F_{25,i}\right)}{\sum_{i=0}^{N} ss_i} \tag{14}$$

The effective number of frequency bins in each block containing significant power levels
(ignoring gaps in the sound) also was computed to give the Short-term Spectral Concentration
by Equation 15.

• Short-term Spectral Concentration: weighted average of $N_{50}$, ss as weights.

$$\frac{\sum_{i=0}^{N} ss_i\, N_{50,i}}{\sum_{i=0}^{N} ss_i} \tag{15}$$


Then, the relative emphasis of sideband energy on either side of the dominant frequency
was estimated by calculating the Short-term Spectral Asymmetry using Equation 16.


•  Short-term  Spectral  Asymmetry:  weighted  average  spectral  asymmetry,  with  ss  as 
weights. 

2.6  Aggregate  Bandwidth 


Two aggregate power spectra were computed for each sound cut: the Total Spectrum was
the average of all FFT power spectra for the sound, and the Modal Spectrum accumulated
only the power spectrum magnitudes for the frequency bin with the largest value in each
FFT block. Both aggregate spectra were processed to extract three statistics related to
bandwidth. The Upper ($F_{75}$) and Lower ($F_{25}$) Frequencies estimated the bounds of the
aggregate spectra. These could be used to compute a bandwidth spanning any gaps in the
spectral density. The Spectrum Concentration ($N_{50}$) provided an estimate of a “packed” or
gap-free bandwidth.
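
The two aggregate spectra can be sketched as follows, assuming each block's (noise-adjusted) power spectrum is a list of equal length. The function name is hypothetical, and ties for the largest bin are resolved by taking the lowest-frequency bin, a detail the text does not address.

```python
def aggregate_spectra(spectra):
    """Return (total_spectrum, modal_spectrum) for a list of block power spectra."""
    n_bins = len(spectra[0])
    # Total Spectrum: bin-by-bin average over all blocks.
    total = [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]
    # Modal Spectrum: from each block, accumulate only its single largest bin.
    modal = [0.0] * n_bins
    for s in spectra:
        peak = max(range(n_bins), key=lambda k: s[k])
        modal[peak] += s[peak]
    return total, modal
```

The same per-spectrum measures used for individual blocks ($F_{25}$, $F_{50}$, $F_{75}$, $N_{50}$) can then be applied to either aggregate spectrum.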


2.7 Center Frequency


Three statistics were used to estimate the aggregate center frequency, or “average” frequency,
of the entire sound cut. They were the $F_{50}$ values computed for the Total and
Modal Spectra, and a weighted average of the instantaneous $F_{50}$ values computed by
Equation 17. These statistics produced very similar values in most instances.

• Median Frequency Mean: weighted average of $F_{50}$, ss as weights.

$$\overline{F_{50}} = \frac{\sum_{i=0}^{N} ss_i\, F_{50,i}}{\sum_{i=0}^{N} ss_i} \tag{17}$$


2.8 Amplitude-Frequency Interaction


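The strength of a linear trend between amplitude and center frequency was computed as
the Amplitude-Frequency Correlation by Equation 18.


• Amplitude Frequency Correlation: correlation between ss and $F_{50}$.

$$\frac{\sum_{i=0}^{N} F_{50,i}\, ss_i - \overline{ss}\,\sum_{i=0}^{N} F_{50,i}}{\sqrt{\sum_{i=0}^{N}\left(ss_i - \overline{ss}\right)^{2}\;\sum_{i=0}^{N}\left(F_{50,i} - \tilde{F}_{50}\right)^{2}}} \tag{18}$$

where $\overline{ss}$ is the Amplitude Mean (Eq. 4) and $\tilde{F}_{50}$ is the unweighted mean of the $F_{50}$ values.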

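The magnitude of a linear trend between amplitude and frequency modulation was
calculated as the Amplitude Upsweep Correlation, using Equation 19.

• Amplitude Upsweep Correlation: correlation between mm and $\Delta F_{50}$.

$$\frac{\sum_{i=1}^{N} mm_i\, \Delta F_{50,i} - \overline{mm}\,\sum_{i=1}^{N} \Delta F_{50,i}}{\sqrt{\left(\sum_{i=1}^{N} mm_i^{2} - N\,\overline{mm}^{\,2}\right)\left(\sum_{i=1}^{N} \Delta F_{50,i}^{2} - \dfrac{1}{N}\left(\sum_{i=1}^{N} \Delta F_{50,i}\right)^{2}\right)}} \tag{19}$$

where

$$\overline{mm} = \frac{1}{N}\sum_{i=1}^{N} mm_i$$


3 Noise Performance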


To  explore  the  effectiveness  of  our  system  of  noise  compensation,  test  sounds  were  copied 
and  contaminated  with  increasing  levels  of  noise.  These  test  files  were  processed  to  generate 
plots  of  each  statistic  relative  to  signal/noise  ratios.  The  differences  between  adjacent  points 
along  each  curve  reflected  estimation  error,  because  adjacent  points  represented  different 
noise  sequences  with  similar  signal /noise  ratios.  Any  overall  trend  in  these  plots  reflected 
imperfect  noise  compensation. 

Against  Gaussian  white  noise,  these  noise  tests  were  successful.  However,  the  density 
of  this  type  of  noise  was  favorable  for  other  noise  compensation  algorithms  as  well,  and 
we  were  reminded  that  noise  backgrounds  in  the  ocean  were  rarely  “white”  (equal  energy 
at  all  frequencies).  Therefore,  noise  was  generated  synthetically  to  resemble  the  ambient 
background  for  many  of  our  recordings  at  sea  using  MATLAB  software  (The  Math  Works, 
Inc.).  Parametric  spectral  estimation  procedures  extracted  parameters  for  a  sixth  order 
autoregressive  (AR)  process  from  an  ambient  noise  sample.  Noise  sequences  were  generated 
by  filtering  white  noise  with  a  finite  impulse  response  (FIR)  filter  constructed  from  the  AR 
estimates. 
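
The idea of fitting a low-order AR model to an ambient noise sample and then synthesizing new noise from it can be sketched in plain Python. This is not the MATLAB procedure used for the report: the sketch fits the model with the Yule-Walker equations solved by Levinson-Durbin recursion, and it generates noise by running the AR recursion directly rather than through the FIR filter the text describes. Function names are hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Biased sample autocovariance of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def fit_ar(x, order=6):
    """Yule-Walker fit: x[n] = sum_k a[k] * x[n-1-k] + e[n]. Returns (a, noise_var)."""
    r = autocorr(x, order)
    a, err = [], r[0]
    for m in range(order):
        # Levinson-Durbin step: reflection coefficient for order m + 1.
        refl = (r[m + 1] - sum(a[k] * r[m - k] for k in range(m))) / err
        a = [a[k] - refl * a[m - 1 - k] for k in range(m)] + [refl]
        err *= 1.0 - refl * refl
    return a, err


def synthesize(a, noise_var, n, rng):
    """Generate n samples by driving the AR recursion with Gaussian white noise."""
    out = [0.0] * len(a)                      # zero initial state
    sd = noise_var ** 0.5
    for _ in range(n):
        recent = out[-1:-len(a) - 1:-1]       # x[n-1], x[n-2], ...
        out.append(sum(c * v for c, v in zip(a, recent)) + rng.gauss(0.0, sd))
    return out[len(a):]
```

A typical use would be `a, var = fit_ar(ambient_sample)` followed by `synthesize(a, var, n, random.Random(seed))`, where `ambient_sample` stands for a list of samples from a noise-only recording; scaling the result before adding it to a test sound sets the signal/noise ratio.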

The sound statistics were tested against noise and plotted in Figures 1 through 27 to
provide an indication of performance for each statistic. In these figures, the horizontal axis
represented signal/noise ratios, proceeding from low to relatively high values. Note that
most of our marine mammal sound sequences exceeded 15 dB signal/noise. The vertical
axes represented the estimated numeric level for each statistic. The variance may be seen in
the relative amplitude fluctuations on the vertical axes, and the trend for each statistic may
be seen in the relative changes in the progression from lower to higher signal/noise along
the horizontal axes.

The performance of each of the statistics relative to noise was summarized in the table
below, with the plots subjectively classified as follows.


Table of Noise Performance for the Statistics


Low Variance


Signal Center (Fig. 1)
Median Frequency Mean (Fig. 25)
Upsweep Mean (Fig. 9)
Time Frequency Correlation (Fig. 12)
Time Upsweep Correlation (Fig. 13)
Upsweep Fraction (Fig. 10)
Total Spectrum Median (Fig. 23)
Modal Lower Frequency (Fig. 21)
Modal Spectrum Median (Fig. 24)
Modal Upper Frequency (Fig. 20)


Sound Duration (Fig. 2)
Sound Concentration (Fig. 3)
Amplitude Mean (Fig. 4)
Amplitude Standard Deviation (Fig. 5)
Short-term Bandwidth Mean (Fig. 14)
Short-term Spectral Concentration (Fig. 15)
Total Upper Frequency (Fig. 17)
Total Lower Frequency (Fig. 18)
Total Spectrum Concentration (Fig. 19)
Modal Spectrum Concentration (Fig. 22)


High Variance


Attack Fraction (Fig. 6)
Attack Proportion (Fig. 7)
Upsweep Proportion (Fig. 11)
Short-term Spectral Asymmetry (Fig. 16)
Amplitude Skewness (Fig. 8)


Amplitude Frequency Correlation (Fig. 26)
Amplitude Upsweep Correlation (Fig. 27)


Generally, the statistics performed well in noise. Higher order statistics (standard deviations,
correlations) were less consistent, and the least useful were those statistics measuring
frequency-amplitude relations and gross asymmetry in the sound waveform envelope.

4 Preliminary Analysis of Marine Animal Sounds

A subset of approximately 200 sequences of marine mammal sounds was used to test these
statistics. Sounds were selected without attention to any particular acoustic features. They
included sequences from 20 species: six baleen whales, 13 toothed whales and dolphins, and
one seal. A wide variety of sound types was included in this subset; we also included sounds
from pairs of species that were difficult to distinguish aurally.

4.1  Statistical  Interdependence 

The  redundancy  in  these  statistics,  for  this  data  set,  was  examined  by  a  stepwise  multiple 
regression  procedure.  This  analysis  treated  each  sound  equally,  ignoring  the  identity  of 
the  sequence.  At  each  stage,  the  algorithm  identified  the  statistic  that  had  the  highest 
linear  correlation  with  other  statistics  for  these  data.  This  statistic  was  removed,  and  the 
analysis  repeated.  When  the  correlation  coefficients  and  scatter  plots  indicated  relatively 
poor  fits,  the  analysis  was  terminated.  We  anticipated  some  redundancy  in  our  statistics; 
we  intended  to  test  alternative  sound  measures.  A  more  conclusive  analysis  of  redundancy 
and  performance  awaits  analysis  of  larger  data  sets. 
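
The elimination procedure described above can be sketched as a loop that repeatedly finds the statistic best predicted by a linear combination of the others and removes it. This is an illustration, not the procedure actually run for the report: the function names are hypothetical, the fixed stopping threshold is an assumption (the text says the analysis stopped when correlation coefficients and scatter plots indicated relatively poor fits), and the least-squares fit uses a small hand-written solver to stay within the standard library.

```python
def r_squared(y, xs):
    """R^2 of an ordinary least-squares fit of y on the columns xs, with intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in xs]
    p = len(cols)
    # Augmented normal equations, solved by Gaussian elimination with pivoting.
    m = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols]
         + [sum(ci[r] * y[r] for r in range(n))] for ci in cols]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    b = [0.0] * p
    for c in reversed(range(p)):
        b[c] = (m[c][p] - sum(m[c][k] * b[k] for k in range(c + 1, p))) / m[c][c]
    fitted = [sum(b[j] * cols[j][r] for j in range(p)) for r in range(n)]
    mean = sum(y) / n
    ss_tot = sum((v - mean) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def eliminate_redundant(table, threshold=0.8):
    """table maps statistic name -> list of values, one per sound.

    Returns (removed, remaining), where removed lists (name, R^2) in removal order.
    """
    remaining = dict(table)
    removed = []
    while len(remaining) > 2:
        scores = {name: r_squared(col, [c for other, c in remaining.items()
                                        if other != name])
                  for name, col in remaining.items()}
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:       # ASSUMED stopping rule
            break
        removed.append((worst, scores[worst]))
        del remaining[worst]
    return removed, list(remaining)
```

The solver assumes that no predictor column is an exact linear combination of the others; real data with exactly duplicated statistics would need a check for that case.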

For the comparisons described here, the multiple regression functions explained more
than 80% of the variance for the first nine statistical estimators. The statistics that were
successively eliminated were (with proportion of variance explained): Amplitude Standard
Deviation (0.979, fig. 28), Median Frequency Mean (0.974, fig. 29), Modal Spectrum Median
(0.974, fig. 30), Total Upper Frequency (0.944, fig. 31), Total Spectrum Concentration
(0.900, fig. 32), Modal Lower Frequency (0.896, fig. 33), Modal Upper Frequency (0.885,
fig. 34), Short-term Spectral Concentration (0.878, fig. 35), and Total Spectrum Median
(0.839, fig. 36). In these figures, the horizontal axes represented the predicted value, the
vertical axes represented the observed value, and the dark line represented the regression
line. Eight of the nine variables appear to scale with center frequency, and we suspect that
re-expressing these in relation to center frequency would remove much of this redundancy.
Amplitude Standard Deviation and other amplitude variables should be re-expressed relative
to Amplitude Mean for similar reasons. It was not resolved whether simple division by these
scaling factors would be appropriate.

4.2  Acoustical  Analyses  and  Biological  Information 

The ability to select and analyze acoustic measurements based on related biological or
environmental observations was crucial for these data. This could have been done by segregating
data  files  for  different  species,  activities,  locations,  etc.  and  independently  processing  each 
batch.  However,  it  would  have  been  cumbersome  and  difficult  to  manage  such  sorting  and 
data  segregation  for  each  new  query,  especially  as  the  selections  became  more  complicated. 
A  more  powerful  technique  was  to  link  the  numerical  analyses  directly  to  the  text  databases. 
All  sound  cuts  were  processed  in  one  batch,  and  these  extensive  computations  proceeded 
automatically,  unattended.  Interactive  exploration  of  relationships  among  statistics  and 
biological  or  environmental  factors  followed,  with  all  of  the  flexibility  and  convenience  of 
database  queries  and  reports. 

The SOUND text databases for the recordings and the digital sound sequences (Watkins,
Fristrup, and Daher 1991) could have accommodated new numeric data from the statistical
analyses, but with INMAGIC software this required restructuring the entire database each
time the number of numeric fields changed. This was not feasible: the analyses required
many iterations and modifications. Therefore, PARADOX software (which supports relational
database models with a visual, query-by-example interface) was used to provide more flexible
linkage between biological and acoustic information. The text information from the
SOUND databases remained unmodified as a single table, and additional tables were created
to handle the numeric results. Fields were used in these numeric tables that identified
the related SOUND text records. Then, subsets of the statistical results were obtained
by selecting particular fields in SOUND and reporting the linked numerical information.
Note that these queries could be reversed to select pertinent biological or environmental
information based on acoustic criteria.
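
The linkage between a text table and a separate numeric table can be sketched with Python's built-in `sqlite3` module standing in for PARADOX. Every table name, field name, and value below is invented for illustration; none comes from the SOUND databases.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database: one text table and one linked numeric table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sound (
            cut_id   TEXT PRIMARY KEY,   -- identifies one digitized sound cut
            species  TEXT,
            location TEXT
        );
        CREATE TABLE duration_stats (
            cut_id              TEXT REFERENCES sound(cut_id),
            sound_duration      REAL,
            sound_concentration REAL
        );
    """)
    # Placeholder rows (hypothetical values, not measurements).
    conn.executemany("INSERT INTO sound VALUES (?, ?, ?)", [
        ("cut001", "species A", "site 1"),
        ("cut002", "species A", "site 2"),
        ("cut003", "species B", "site 1"),
    ])
    conn.executemany("INSERT INTO duration_stats VALUES (?, ?, ?)", [
        ("cut001", 1.0, 0.50),
        ("cut002", 2.0, 0.80),
        ("cut003", 4.0, 0.20),
    ])
    return conn


def stats_for_species(conn, species):
    """Select by a biological field and report the linked numeric results."""
    rows = conn.execute(
        "SELECT s.cut_id, d.sound_duration, d.sound_concentration "
        "FROM sound AS s JOIN duration_stats AS d ON d.cut_id = s.cut_id "
        "WHERE s.species = ? ORDER BY s.cut_id", (species,))
    return rows.fetchall()


def species_with_duty_cycle_below(conn, limit):
    """Reverse query: select biological information by an acoustic criterion."""
    rows = conn.execute(
        "SELECT DISTINCT s.species FROM sound AS s "
        "JOIN duration_stats AS d ON d.cut_id = s.cut_id "
        "WHERE d.sound_concentration / d.sound_duration < ? "
        "ORDER BY s.species", (limit,))
    return [r[0] for r in rows]
```

Adding a new family of statistics then means adding another numeric table keyed on the same identifier, without restructuring the text table.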

4.3 Preliminary Species Summaries

We generated summaries of the numerical results for each of the species in the trial data set
(see Tables 2 through 8). For each species, the mean value of the statistic was listed with
the maximum as a superscript and the minimum as a subscript. The number of sounds
analyzed (Count) for each species was indicated in Table 2, but not repeated in the other
tables. Tables were divided in three sections: baleen whales, toothed species, and one seal
with a transient sequence (hammer simulating clicks of Physeter catodon).

Two  aspects  of  the  summaries  of  sound  duration  (Table  2)  merit  comment.  Species  that 
are  represented  by  more  than  ten  sounds  showed  a  dramatic  variation  between  upper  and 
lower  bounds  on  both  statistics.  This  may  have  reflected  different  selection  criteria  for  the 
sound  cuts.  These  sound  cuts  may  have  been  a  mix  of  isolated  sounds,  long  sound  sequences 
with  intervals  of  silence,  or  continuous  choruses  from  many  individuals.  All  of  these  are 
valid  data,  but  we  need  to  differentiate  among  these  classes  of  recordings  in  future  analyses. 
Also,  the  sound  “duty  cycle”  could  be  calculated  by  dividing  Sound  Concentration  by  Sound 
Duration.  Table  2  indicates  that  baleen  whales  could  largely  be  distinguished  from  toothed 
species  by  comparison  of  sound  duty  cycles. 

Table 2

Mean values only. The maximum and minimum bounds, printed as superscripts and subscripts
in the original, are not legible in this copy; entries marked [illegible] could not be recovered.

Species               Count   Sound Duration (s)   Sound Concentration (s)

B. mysticetus           5     1.62                 [illegible]
C. marginata            3     .86                  .297
E. glacialis            5     .83                  .265
E. australis           16     .82                  .443
B. acutorostrata        5     .51                  .255
B. physalus            15     1.07                 [illegible]

P. catodon             28     4.15                 .057
D. leucas               1     1.17                 .517
S. longirostris        25     1.39                 .089
S. long. + P. cat.      4     1.29                 .079
S. bredanensis          7     1.72                 .230
C. commersonii          2     1.46                 .102
D. delphis             13     1.68                 [illegible]
G. griseus             23     4.06                 .414
G. macrorhynchus       11     1.68                 .403
G. melaena             11     1.37                 [illegible]
O. orca                 7     .98                  [illegible]
P. crassidens           6     5.94                 .470
P. phocoena             8     1.54                 .066
I. geoffrensis          2     1.31                 .121

A. phillipi             4     4.43                 .670
Hammer on metal         1     2.41                 .079


Table 3 displayed statistics related to amplitude modulation. The first two of these
statistics were not useful diagnostics in themselves because the absolute value of each was
inherently tied to equipment gain settings at any stage of processing. However, the
ratio of Amplitude Standard Deviation to Amplitude Mean (referred to here as the standard
error, although this ratio is more commonly called the coefficient of variation) was a useful
indication of amplitude modulation, with larger values indicating more modulation.
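
The gain argument can be checked numerically. The sketch below uses hypothetical amplitude samples and a helper name of our own (`relative_spread`); the choice of the population standard deviation is also an assumption. It shows that multiplying every sample by a gain factor changes the mean and the standard deviation but leaves their ratio unchanged.

```python
import statistics


def relative_spread(amplitudes):
    """Amplitude Standard Deviation divided by Amplitude Mean."""
    mean = statistics.fmean(amplitudes)
    if mean == 0:
        raise ValueError("mean amplitude is zero")
    return statistics.pstdev(amplitudes) / mean


# Hypothetical block amplitudes (arbitrary units), not data from this report.
samples = [0.2, 0.9, 0.4, 1.5, 0.3, 1.1]
gain = 20.0  # stands in for a different recorder or amplifier setting
amplified = [gain * a for a in samples]

ratio_original = relative_spread(samples)
ratio_amplified = relative_spread(amplified)
print(ratio_original, ratio_amplified)
```

Because the ratio is unaffected by gain, it can be compared across recordings made with different equipment settings, whereas the two raw statistics cannot.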

Attack Fraction and Attack Proportion were negatively correlated because sound cuts
were edited so that initial and terminal noise levels were approximately the same. One of
these statistics probably would have been sufficient.
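
The reason for this negative correlation can be sketched as follows. The definitions in the code are assumptions on our part, since eqs. 6 and 7 are not reproduced in this section: Attack Fraction is taken as the share of block-to-block amplitude changes that are increases, and Attack Proportion as the mean increase divided by the sum of the mean increase and the mean decrease. The amplitude levels are hypothetical.

```python
def attack_stats(levels):
    """Return (attack_fraction, attack_proportion) for block amplitude levels.

    ASSUMED definitions (eqs. 6 and 7 are not reproduced in this section):
      fraction   = number of increases / number of non-zero changes
      proportion = mean increase / (mean increase + mean decrease)
    """
    changes = [b - a for a, b in zip(levels, levels[1:])]
    rises = [c for c in changes if c > 0]
    falls = [-c for c in changes if c < 0]
    if not rises or not falls:
        raise ValueError("need at least one increase and one decrease")
    fraction = len(rises) / (len(rises) + len(falls))
    mean_rise = sum(rises) / len(rises)
    mean_fall = sum(falls) / len(falls)
    proportion = mean_rise / (mean_rise + mean_fall)
    return fraction, proportion


# Hypothetical cut that starts and ends at the same noise level (1.0):
# a sharp onset followed by a slow decay.
levels = [1.0, 5.0, 4.0, 3.0, 2.0, 1.0]
fraction, proportion = attack_stats(levels)
print(fraction, proportion)
```

Under these assumed definitions, equal starting and ending levels force the total rise to equal the total fall, so the proportion equals one minus the fraction. This matches the paired values in Table 3 (for example .44 and .56 for C. marginata).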

Amplitude Skewness appeared to be a less robust statistic. However, some gross
differences agreed with our experience. Among baleen whales, Eubalaena glacialis tended to
start loudly and taper to silence (negative skew), while B. acutorostrata often started softly
and swelled in level.

Table 3

Species               Amplitude   Amplitude   Attack     Attack       Amplitude
                      Mean        StdDev      Fraction   Proportion   Skewness

B. mysticetus         .095        .336        .51        [?]          [?]
C. marginata          .161        .291        .44        .56          -.140
E. glacialis          .195        .501        [?]        .49          -1.127
E. australis          1.254       2.263       .50        .50          [?]
B. acutorostrata      19.177      35.734      .45        .55          1.123
B. physalus           1.612       3.383       .50        .56          .444

P. catodon            .077        [?]         .46        [?]          -.226
D. leucas             .010        .021        .52        .48          .678
S. longirostris       .143        .427        [?]        .50          [?]
S. long. + P. cat.    .010        .046        [?]        [?]          [?]
S. bredanensis        [?]         [?]         .49        .51          .369
C. commersonii        .008        .043        .58        .42          -5.200
D. delphis            .126        .484        .50        .49          -.041
G. griseus            .060        .163        [?]        .51          [?]
G. macrorhynchus      [?]         [?]         .50        .50          [?]
G. melaena            .726        1.653       .50        .50          [?]
O. orca               [?]         .384        .50        .50          .280
P. crassidens         .084        [?]         .49        [?]          .030
P. phocoena           .171        .944        .48        [?]          [?]
I. geoffrensis        .217        1.007       [?]        .57          [?]

A. phillipi           .566        1.359       .51        .49          .081
Hammer on metal       1.489       10.214      .40        [?]          [?]

Entries are species means. The maximum (superscript) and minimum (subscript) printed with
each mean are not legible in this copy and are omitted; [?] marks a mean that is not legible.

Table 4 displays statistics relating to frequency modulation. Baleen whales and toothed
species clearly had different magnitudes of Upsweep Mean, but much of this could have
resulted from the higher center frequencies of most toothed whale sounds.

Upsweep Fraction and Upsweep Proportion were not always negatively correlated,
because the sounds from marine mammals could start or end at very different frequencies.
The numbers for E. glacialis were instructive. Upsweep Mean was negative for this species,
indicating that the sounds had lower frequencies toward the end, and Upsweep Fraction
was nearly half, indicating that about half of the block-to-block changes in frequency were
positive. Thus, most of these downsweeps were greater in magnitude than the upsweeps,
and indeed the Upsweep Proportion was less than one-half.

The Time Frequency and Time Upsweep Correlations showed considerable variation
within larger samples. We know that the features they target are useful diagnostics, so we
must seek better means of measuring them. Alternatives include non-parametric measures
of correlation.
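
One such non-parametric measure is the Spearman rank correlation, which is the ordinary (Pearson) correlation applied to ranks. Naming it here is our suggestion and not a method chosen in this report. The sketch below uses hypothetical block times and frequencies in which a single noisy block distorts the Pearson coefficient far more than the rank-based one.

```python
import math


def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical upsweep: frequency rises steadily, except that one noisy
# block produced a wildly high frequency estimate.
time_s = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
freq_hz = [100, 110, 120, 900, 140, 150, 160, 170]

r_pearson = pearson(time_s, freq_hz)
r_spearman = spearman(time_s, freq_hz)
print(round(r_pearson, 2), round(r_spearman, 2))
```

In this example the Pearson coefficient is close to zero even though seven of the eight blocks form a clean upsweep, while the rank-based coefficient remains clearly positive.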




Table 4

Species   Upsweep Mean (Hz/s)   Upsweep Fraction   Upsweep Proportion   Time-Freq. Correlation   Time-Upsweep Correlation

B.  mystioetus 

—395  L*  149 

.47:11 

isolS 

-■m%4 

C.  marginata 

-144 

-191 

.00  g 

.68 

-27f?45 

•18i?22 

E.  glacialis 

—1320  Lm79 

S4 

.04^2 

.34;g| 

-17f®64 

•10  -5)3 

E.  australis 

10215/2 

.69!^ 

1171.00 
•o<  .20 

•18-^ 

•I6fi*i5 

B.  acutorostrata 

0“ioo 

70.97 

•'^.41 

OR  .50 
•wO  ^09 

-08  33 

-.04 

B.  physalus 

-36“i34 

17 .94 
‘  0.00 

09  .91 

0.00 

-  in  1-0® 

— .lU  -1.0 

-.12  f®98 

P.  caiodon 

45628 

CO  i.oo 
•oo  0.00 

cn  1.00 

0.00 

—  •!«»  -.76 

—.03  ’1%3 

D.  leucas 

-680  Ills 

•10  .18 

.69.11 

QO  *.82 

^  g2 

-.021:82 

S.  longirostris 

172907  l"i®92 

70 1.00 
.‘W  .09 

CQ 

-•00f®89 

•04  f  j2 

S.  long.  +  P.  cat. 

81780  iVw6i9 

.34 ‘.40 

•29:81 

-•4o:;S8 

-  03  f®i3 

S.  bredanensis 

41495 

.3®  iSi 

•45 

-•04f®i6 

-•02f5i3 

C.  commersonii 

-6998:I|So7 

17  *26 
.08 

■3T:if 

•10  *^2 

-  41  -5® 

•^^  -.51 

D.  delphis 

-70028 1V^41 

•38*6?’ 

.38?2?’ 

•003. 

•00f^38 

G.  griseus 

44321/20, 

.33  iai 

•49:18 

04.87 
•'^  -.45 

-.02i®27 

G.  macrorhynchus 

10434  l^y^/i 

Ki  .74 
.33 

•31  .M 

■"•04 

-.01  i.35 

G.  melaena 

2431 

ei  .88 

.oi  .01 

•49 :38 

•01  '^go 

-.08  :3.47 

0.  orca 

—129211  Ig^sa 

48  •’^1 
.09 

.43:1! 

—.05  ;?|4o 

-16?90 

P.  crassidens 

_if5i  1082 
-101  _2131 

cc  .65 

.00  ..,2 

•38:fg 

4r:.79 
•^®  .11 

-.003, 

P.  phoooena 

85501O224/9 

.68^:88 

88 

0 

•02L!& 

-•09i®36 

J.  geoffrensis 

-43940  :|2^ 

•33  ;29 

•34 :53 

-  I'i-'O® 
-.10  _.21 

■04i5,2 

A.  phillipi 

—  1633  _4367 

90 .44 

JO 

-45:^5 

-061*5,2 

.02^5,2 

Hammer  on  metal 

-503277 

01 

•wl  .01 

•20:18 

-•361:11 

4R 

•40  .48 

The short-term bandwidth statistics in Table 5, the aggregate bandwidth statistics in
Table 6, and the center frequency statistics of Table 7 were the most diagnostic for this set
of sound sequences. They apparently separated the sounds of different species. Bandwidths
appeared to scale with frequency, suggesting that they could be expressed best in proportion
to center frequency.
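
One simple way to express bandwidth in proportion to center frequency is a plain ratio. The sketch below is only an illustration with hypothetical numbers; the function name `relative_bandwidth` is ours, and the report does not specify which center-frequency statistic would serve as the divisor.

```python
def relative_bandwidth(bandwidth_hz, center_hz):
    """Bandwidth as a dimensionless fraction of center frequency."""
    if center_hz <= 0:
        raise ValueError("center frequency must be positive")
    return bandwidth_hz / center_hz


# Hypothetical (bandwidth Hz, center frequency Hz) pairs, not report data.
low_call = (40.0, 200.0)
high_whistle = (2000.0, 10000.0)

rel_low = relative_bandwidth(*low_call)
rel_high = relative_bandwidth(*high_whistle)
print(rel_low, rel_high)
```

The two hypothetical sounds differ fifty-fold in absolute bandwidth but have the same relative bandwidth, so the rescaled statistic carries information that is not simply a restatement of center frequency.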


Table 5

Species   Short-term Bandwidth Mean (Hz/s)   Short-term Spectral Concentration (Hz/s)   Short-term Spectral Asymmetry


B.  mysticettis 

3450211^11 

- 

•11  .03 

C.  marginata 

ssitit 

106  aa® 

1 7 

•1/  .08 

E,  glacialis 

7444  If ja" 

335”! 

.  1  R  .36 

-•10  0.00 

E.  australis 

4378  3269 

417680 

41/  265 

•22  oloo 

B.  acutoTosirata 

9071^4* 

102  If 

•206‘Jo 

B.  physalus 

464 11® 

40  f 

•06  0^ 

P.  catodon 

1898178  fiSig® 

135440 

41 

•41  .07 

D,  leucas 

52254  If 

4652  Jill 

10  .13 

•  !«»  13 

S.  longirostris 

1189241  Iffgg* 

102772  tlflf 

10.31 
•1*  .00 

S.  long.  +  P.  cat. 

6005T15 

OOO 1 05H  891902 

221269 

•22;oo 

S.  bredanensis 

3454478  JHelle 

166331 106172 

.22^5 

C.  commersonii 

30991^ 

2471  fl?® 

04  -O® 

•”4  .04 

D.  delphis 

1266464  31111®^ 

102935 

10.32 
•■*'*  .01 

G.  griseus 

508874 

26081  fjgl® 

04  .64 
•^4  .08 

G.  macrorhynchus 

70190c  1932495 
iliZOO  129749 

38673  l?if 

OR  .49 
•ZO  10 

G.  melaena 

327154 

27724 

•10  .05 

0.  area 

214199 

17242 

IQ  .46 
•1».09 

Species

B. mysticetus
C. marginata
E. glacialis
E. australis
B. acutorostrata
B. physalus
P. catodon
D. leucas
S. longirostris
S. long. + P. cat.
S. bredanensis
C. commersonii
D. delphis
G. griseus
G. macrorhynchus
G. melaena
O. orca
P. crassidens
P. phocoena
I. geoffrensis
A. phillipi
Hammer on metal

Table 7

Species   Median Frequency Mean (Hz)   Total Spectrum Median (Hz)   Modal Spectrum Median (Hz)

B. mysticetus

Table 8 displays statistics relating to amplitude-frequency interactions. Although
relatively less robust than many of the previous statistics, these measures also appeared to
be useful. For example, C. marginata had a positive Amplitude Frequency Correlation,
suggesting that the higher frequency sections were louder than the low frequency
ones, and a negative Amplitude Upsweep Correlation, indicating that sections with
downsweeps (or smaller upsweeps) tended to be the loudest.

Note that all of these biological sounds except A. phillipi tended to have positive
Amplitude Frequency Correlations (higher frequency sections were louder).

Table 8

Species

4.4 Principal Component Analysis

To obtain a better perspective on the overall distribution of sounds as measured by our
statistics, we performed a principal components analysis on the numerical results. The
first two principal components provided axes for scatter plots that expressed about half
of the total variability in our statistics. As discussed earlier, segregation of the principal
component scores by species was accomplished by linking to the SOUND table in our
PARADOX structure. The baleen whales and toothed species were plotted separately in
Figures 38 and 39. These two groups generally could be separated by frequency alone, and
the separate plots focus attention on the variation within these groups. Each sound cut
is represented by a colored symbol on the plot; color and symbol type redundantly code
species identity.
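
The projection onto the first two principal components can be sketched in plain Python. Everything specific here is an assumption made for illustration: the three features and their values are hypothetical, the columns are standardized before analysis, and power iteration with deflation stands in for whatever routine was actually used.

```python
import math


def standardize(rows):
    """Shift each column to zero mean and scale it to unit variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centered rows."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
            for i in range(m)]


def leading_eigen(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and its unit eigenvector."""
    m = len(matrix)
    v = [float(i + 1) for i in range(m)]  # arbitrary non-symmetric start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to zero")
        v = [x / norm for x in w]
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(m))
                for i in range(m))
    return value, v


def first_two_components(rows):
    """Scores on components 1 and 2, and the share of variance they carry."""
    z = standardize(rows)
    c = covariance(z)
    m = len(c)
    total = sum(c[i][i] for i in range(m))
    value1, vec1 = leading_eigen(c)
    # Deflate: subtract the first component, then find the next one.
    deflated = [[c[i][j] - value1 * vec1[i] * vec1[j] for j in range(m)]
                for i in range(m)]
    value2, vec2 = leading_eigen(deflated)
    scores = [(sum(a * b for a, b in zip(r, vec1)),
               sum(a * b for a, b in zip(r, vec2))) for r in z]
    return scores, (value1 + value2) / total


# Hypothetical sound cuts: (center frequency Hz, bandwidth Hz, duration s).
rows = [
    [200, 50, 1.0], [250, 60, 0.8], [300, 80, 1.2],            # low-frequency
    [8000, 2000, 1.1], [9000, 2500, 0.9], [10000, 2400, 1.3],  # high-frequency
]
scores, explained = first_two_components(rows)
for pc1, pc2 in scores:
    print(f"{pc1:7.2f} {pc2:7.2f}")
```

With only three strongly related features, two components capture nearly all of the variance in this toy data, unlike the roughly one half reported above for the full set of statistics. The first component here separates the low-frequency cuts from the high-frequency ones, in line with the remark that the two groups can be separated by frequency alone.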

In  both  Figure  38  and  Figure  39,  the  data  for  most  species  tended  to  cluster  in  a  relatively 
discrete  portion  of  the  plot.  Some  regions  were  shared  by  a  few  neighboring  species  with 
similar  sound  types.  These  results  appeared  to  confirm  that  acoustic  features  could  be 
analyzed  so  as  to  compute  sound  statistics  that  would  be  useful  to  classify  an  unknown 
biological  sound  as  one  of  a  few  potential  candidates. 

This result was particularly remarkable because the sound data used for the tests
included both specific sounds produced by individuals and sounds produced by many animals,
with temporally overlapping sounds by two or more animals. If we distinguish between the
individual and group recordings, we expect our acoustic classifier to perform better.

A  goal  for  such  analyses  has  been  the  development  of  a  system  for  automatic  diagnosis 
of  marine  animal  sounds  based  on  acoustic  criteria.  The  statistical  problem  for  classification 
of  an  unknown  relative  to  known  groups  is  usually  addressed  by  forming  estimates  of  the 
distance  between  the  unknown  and  the  known,  with  the  most  likely  classification  being 
the  one  that  minimizes  this  distance.  The  distance  measure  that  is  usually  employed  is 
the Mahalanobis distance (e.g., Morrison 1976, p. 241). Alternatively, we may find that
methods  which  make  fewer  assumptions  (Efron  and  Tibshirani  1991)  will  be  better  suited 
to  developing  the  classifier.  Non-acoustic  criteria  also  could  be  incorporated  from  the  text 
databases  for  additional  refinement  of  these  judgements. 
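
A minimal sketch of minimum-distance classification with the Mahalanobis distance follows. It is restricted to two features so that the covariance matrix can be inverted in closed form, it estimates a separate covariance matrix for each group, and the group names and feature values are hypothetical; none of these choices is taken from this report.

```python
import math


def mean_vector(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def covariance_2d(points):
    """Sample covariance matrix [[a, b], [b, d]] returned as (a, b, d)."""
    n = len(points)
    mx, my = mean_vector(points)
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    d = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return a, b, d


def mahalanobis(x, points):
    """Mahalanobis distance from x to the group described by points."""
    mx, my = mean_vector(points)
    a, b, d = covariance_2d(points)
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def classify(x, groups):
    """Name of the group with the smallest Mahalanobis distance to x."""
    return min(groups, key=lambda name: mahalanobis(x, groups[name]))


# Hypothetical feature pairs (for example, two principal component scores).
groups = {
    "species A": [(1.0, 2.0), (1.2, 2.3), (0.8, 1.9), (1.1, 2.4), (0.9, 1.8)],
    "species B": [(3.0, 0.5), (3.4, 0.9), (2.8, 0.4), (3.1, 0.8), (2.7, 0.3)],
}
unknown = (1.05, 2.1)
print(classify(unknown, groups))
```

Unlike a plain Euclidean distance, this measure scales each direction by the spread of the group, so an unknown sound is judged relative to how variable each species is along each feature.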

5 Discussion and Summary


Software tools have been developed and tested for statistical analysis of marine animal
sounds. The preliminary results suggest that such sounds can be classified by means of
relatively simple statistical algorithms. Three areas of this research are planned for particular
attention.

The SOUNDC database will need to be modified to identify additional information about
the sound cuts. These additions will include identification of choruses of overlapping sounds
and notes regarding the sequential organization of discrete sound elements. Sequences will
also need to be identified relative to the usual pattern of sound production for that species
or behavior, and noted if atypical.

Continued evolution of our sound statistics is inevitable. For example, all frequency
statistics should be re-scaled so that the other information is not swamped by gross
differences in center frequency. Some correlations may need to be replaced with more robust
(perhaps non-parametric) alternatives. Methods for expressing some of these statistics may
need to be investigated to increase their independence from each other.

A comprehensive analysis of noise sensitivity and compensation techniques is also
important. The specificity of our acoustic measurements is improved by removing the influence
of ambient noise, but our ability to classify could be critically impaired if we erroneously
discard portions of the signal.

Figure 1. Noise performance of Signal Center (eq. 1). The vertical axis scores Signal Center
in seconds; the horizontal axis scores signal/noise. Note the small range of values on the
vertical axis. We subjectively label this as low variance, small trend.

Figure 2. Noise performance of Signal Duration (eq. 2). The vertical axis scores Signal
Duration in seconds; the horizontal axis scores signal/noise. We subjectively label this as
low variance, large trend.

Figure 4. Noise performance of Amplitude Mean (eq. 4). The vertical axis scores Amplitude
Mean in arbitrary units; the horizontal axis scores signal/noise. We subjectively label this
as low variance, large trend.

Figure 5. Noise performance of Amplitude Standard Deviation (eq. 5). The vertical
axis scores Amplitude Standard Deviation in arbitrary units; the horizontal axis scores
signal/noise. We subjectively label this as low variance, large trend.

Figure 6. Noise performance of Attack Fraction (eq. 6). The vertical axis scores Attack
Fraction (values of 0 to 1 are possible); the horizontal axis scores signal/noise. We
subjectively label this as high variance, small trend.

Figure 7. Noise performance of Attack Proportion (eq. 7). The vertical axis scores Attack
Proportion (values of 0 to 1 are possible); the horizontal axis scores signal/noise. We
subjectively label this as high variance, small trend.

Figure 8. Noise performance of Amplitude Skewness (eq. 8). The vertical axis scores
Amplitude Skewness (scale independent); the horizontal axis scores signal/noise. We
subjectively label this as high variance, small trend.

Figure 9. Noise performance of Upsweep Mean (eq. 9). The vertical axis scores Upsweep
Mean in Hz/s.; the horizontal axis scores signal/noise. We subjectively label this as low
variance, small trend.

Figure 10. Noise performance of Upsweep Fraction (eq. 10). The vertical axis scores
Upsweep Fraction (values of 0 to 1 are possible); the horizontal axis scores signal/noise.
We subjectively label this as low variance, small trend.

Figure 11. Noise performance of Upsweep Proportion (eq. 11). The vertical axis scores
Upsweep Proportion (values of 0 to 1 are possible); the horizontal axis scores signal/noise.
We subjectively label this as high variance, small trend.

Figure 12. Noise performance of Time Frequency Correlation (eq. 12). The vertical axis
scores Time Frequency Correlation coefficients (values of -1 to 1 are possible); the horizontal
axis scores signal/noise. We subjectively label this as low variance, small trend.

Figure 13. Noise performance of Time Upsweep Correlation (eq. 13). The vertical axis
scores Time Upsweep Correlation coefficients (values of -1 to 1 are possible); the horizontal
axis scores signal/noise. We subjectively label this as low variance, small trend.

Figure 14. Noise performance of Short-term Bandwidth Mean (eq. 14). The vertical
axis scores Short-term Bandwidth Mean in Hz/s.; the horizontal axis scores signal/noise. We
subjectively label this as low variance, large trend.

Figure 15. Noise performance of Short-term Spectral Concentration. The vertical axis
scores Short-term Spectral Concentration in Hz/s.; the horizontal axis scores signal/noise.
We subjectively label this as low variance, large trend.

Figure 16. Noise performance of Short-term Spectral Asymmetry (eq. 16). The vertical
axis scores Short-term Spectral Asymmetry (values of 0 to 1 are possible); the horizontal
axis scores signal/noise. We subjectively label this as high variance, small trend.

Figure 17. Noise performance of Total Upper Frequency (see p. 5). The vertical axis scores
Total Upper Frequency in Hz; the horizontal axis scores signal/noise. We subjectively label
this as low variance, large trend.

Figure 18. Noise performance of Total Lower Frequency (see p. 5). The vertical axis scores
Total Lower Frequency in Hz; the horizontal axis scores signal/noise. We subjectively label
this as low variance, large trend.

Figure 19. Noise performance of Total Spectrum Concentration (see p. 5). The vertical
axis scores Total Spectrum Concentration in Hz; the horizontal axis scores signal/noise. We
subjectively label this as low variance, large trend.

Figure 20. Noise performance of Modal Upper Frequency (see p. 5). The vertical axis scores
Modal Upper Frequency in Hz; the horizontal axis scores signal/noise. We subjectively label
this as low variance, small trend.

Figure 22. Noise performance of Modal Spectrum Concentration (see p. 5). The vertical
axis scores Modal Spectrum Concentration in Hz; the horizontal axis scores signal/noise.
We subjectively label this as low variance, small trend.

Figure 23. Noise performance of Total Spectrum Median (see p. 5). The vertical axis scales
Total Spectrum Median in Hz; the horizontal axis scales signal/noise. We subjectively label
this as low variance, small trend.

Figure 24. Noise performance of Modal Spectrum Median (see p. 5). The vertical axis scales
Modal Spectrum Median in Hz; the horizontal axis scales signal/noise. We subjectively label
this as low variance, small trend.

Figure 25. Noise performance of Median Frequency Mean (eq. 17). The vertical axis scales
Median Frequency Mean in Hz; the horizontal axis scales signal/noise. We subjectively
label this as low variance, small trend.

Figure 26. Noise performance of Amplitude Frequency Correlation (eq. 18). The vertical
axis scales Amplitude Frequency Correlation coefficients (values of -1 to 1 are possible);
the horizontal axis scales signal/noise. We subjectively label this as high variance, large
trend.

Figure 27. Noise performance of Amplitude Upsweep Correlation (eq. 19). The vertical
axis scales Amplitude Upsweep Correlation coefficients (values of -1 to 1 are possible); the
horizontal axis scales signal/noise. We subjectively label this as high variance, large trend.

Figure 28. Linear prediction of Amplitude Standard Deviation based on 25 other variables
(Signal Center not used for any regressions). The axes are in arbitrary units. The percent
of variance explained by the linear regression is displayed in the title.

Figure 29. Linear prediction of Median Frequency Mean based on 24 other variables
(Amplitude Standard Deviation already removed). The axes are in Hz, but the data are shifted
such that the means are zero. The percent of variance explained by the linear regression is
displayed in the title.

Figure 30. Linear prediction of Modal Spectrum Median based on 23 other variables (two
variables previously removed). The axes are in Hz, but the data are shifted such that the
means are zero. The percent of variance explained by the linear regression is displayed in
the title.

Figure 31. Linear prediction of Total Upper Frequency based on 22 other variables (three
variables previously removed). The axes are in Hz, but the data are shifted such that the
means are zero. The percent of variance explained by the linear regression is displayed in
the title.

Figure 32. Linear prediction of Total Spectrum Concentration based on 21 other variables
(four variables previously removed). The axes are in Hz, but the data are shifted such that
the means are zero. The percent of variance explained by the linear regression is displayed
in the title.

Figure 33. Linear prediction of Modal Lower Frequency based on 20 other variables (five
variables previously removed). The axes are in Hz, but the data are shifted such that the
means are zero. The percent of variance explained by the linear regression is displayed in
the title.

Figure 34. Linear prediction of Modal Upper Frequency based on 19 other variables (six
variables previously removed). The axes are in Hz, but the data are shifted such that the
means are zero. The percent of variance explained by the linear regression is displayed in
the title.

Figure 35. Linear prediction of Short-term Spectral Concentration based on 18 other
variables (seven variables previously removed). The axes are in Hz, but the data are shifted
such that the means are zero. The percent of variance explained by the linear regression is
displayed in the title.

Figure 36. Linear prediction of Total Spectrum Median based on 17 other variables (eight
variables previously removed). The axes are in Hz, but the data are shifted such that the
means are zero. The percent of variance explained by the linear regression is displayed in
the title.

Figure 37. Linear prediction of Amplitude Upsweep Correlation based on 16 other variables
(nine variables previously removed). The axes are correlation coefficients (values of -1 to 1
are possible), but the data are shifted such that the means are zero. The percent of variance
explained by the linear regression is displayed in the title.

Figure 38. Plot of sound samples from six baleen whale species. The horizontal axis
scales each sample's score on the first principal component (which basically reflects center
frequency). The vertical axis scales each sample's score on the second principal component
(not easily interpreted).

Figure 39. Plot of sound samples from thirteen toothed whale species. The horizontal axis
scales each sample's score on the first principal component (which basically reflects center
frequency). The vertical axis scales each sample's score on the second principal component
(not easily interpreted).